Geospatial Data in R

Week 1 - Review of R and R Markdown

Prof Josh Merfeld

August 9, 2024

Introduction

Syllabus and contents

My slides

  • Before we get into it, I have put all of my material on the web

  • You can find my slides (along with copy-pasteable code) on my GitHub repository:

    • https://github.com/JoshMerfeld/geospatialdataR
    • Scroll down to the bottom and you’ll find all of the links you’ll need.
    • I’d suggest you have this page open during classes. I’ll sometimes ask you to use something on the repo.

Let’s get started!

  • Let’s get started with the hands-on session.

  • As a first step, I’d like to hear from all of you:

    • How much experience do you have using R?
    • This includes all of your R experience, not just experience with SAE

Some things to note

  • We will be using RStudio throughout the workshops
    • There are other options you are welcome to use (VS Code is a popular alternative)
  • There are two general “data cleaning” pipelines in R: base R and the tidyverse
    • We will be using the tidyverse
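To make the two pipelines concrete, here is a minimal sketch of the same small task written in base R and with dplyr, one of the tidyverse packages. It uses the built-in mtcars data so no files are needed. The task itself (filter, select, sort) is just an illustration, and the dplyr half only runs if dplyr is already installed.

```r
# Task: keep cars with mpg above 25, keep two columns, sort by mpg.
# mtcars ships with R, so no files are needed.

# Base R: logical indexing picks rows and columns, order() sorts
base_result <- mtcars[mtcars$mpg > 25, c("mpg", "cyl")]
base_result <- base_result[order(base_result$mpg), ]
print(base_result)

# Tidyverse: the same steps as a pipeline of dplyr "verbs".
# Guarded so the sketch still runs if dplyr is not installed yet.
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  tidy_result <- mtcars %>%
    filter(mpg > 25) %>%
    select(mpg, cyl) %>%
    arrange(mpg)
  print(tidy_result)
}
```

Both versions produce the same rows. The tidyverse version reads top to bottom as a sequence of steps, which is the main reason we use it for data cleaning in these workshops.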

Getting started with RStudio

  • Let’s start by looking at the layout of RStudio.

  • For those of you with ample R experience, nothing here will be new!

Why don’t you all give it a try

  • Create a script in RStudio

  • Save that script in a specific place (folder) on your computer

    • Make sure to keep track of where you save it!
    • I create a folder for each specific project I work on
    • e.g. you could create “Nairobi Workshops” and save the script as “day1.R”

First things first: the working directory

  • The working directory is the folder that R is currently working in
    • This is where R will look for files
    • This is where R will save files
    • This is where R will create files
  • You can always write out an entire file path, but this is tedious
    • More importantly, it makes your code less reproducible since the path is specific to YOUR computer
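A quick sketch of the difference between a relative and an absolute path. The file data/households.csv and the absolute path below are hypothetical examples, not files from the workshop repository.

```r
# Hypothetical file: households.csv inside a "data" subfolder of your project
relative_path <- file.path("data", "households.csv")
relative_path                     # "data/households.csv"

# R interprets a relative path starting from the working directory,
# so this is the full location it would actually try to open
file.path(getwd(), relative_path)

# An absolute path hard-codes every folder (hypothetical example);
# it stops working as soon as the project lives on another computer
absolute_path <- "/Users/yourname/Nairobi Workshops/data/households.csv"
file.exists(absolute_path)        # almost certainly FALSE on your machine
```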

  • One nice thing about RStudio is that if you launch it by opening a script (e.g. double-clicking the file), the working directory is automatically set to the folder that contains the script
    • Let’s try this. Save your script to a folder on your computer, close RStudio, then open the script from that folder.
    • Let’s see if it worked!
Code
getwd() # this command will show you your current working directory
[1] "/Users/Josh/Dropbox/KDIS/Classes/geospatialdataR"

  • You can also set the working directory in RStudio
    • Session > Set Working Directory > Choose Directory... (or To Source File Location)
    • Give it a try and let’s see if it worked!
Code
getwd() # this command will show you your current working directory
[1] "/Users/Josh/Dropbox/KDIS/Classes/geospatialdataR"
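The Session menu does the same thing as calling setwd() yourself. Here is a small sketch that moves into R's temporary folder and back, so nothing on your computer changes. Keep in mind that hard-coding something like setwd("C:/Users/...") at the top of a script ties it to one computer, which is exactly the reproducibility problem described earlier.

```r
start_dir <- getwd()                 # where we are now

# setwd() changes the working directory and invisibly returns the old one.
# tempdir() is a temporary folder R creates for each session.
previous_dir <- setwd(tempdir())
getwd()                              # now the temporary folder

setwd(previous_dir)                  # go back to where we started
getwd()
```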

Always use the same working directory!

  • Make sure to always set the working directory to the same location when working in the same script!

  • This will avoid problems later (e.g. R not finding your data files)

    • It also makes your code more reproducible (e.g. if a colleague wants to run it, you just send the entire folder and it works with no changes)
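A sketch of why sending the whole folder works: the same relative-path code reads the data from the original folder and from a copy of it. The folder names, the toy households.csv file, and the read_households() helper are all hypothetical, and everything is created inside R's temporary directory.

```r
# Toy project folder (hypothetical) inside R's temporary directory
project <- file.path(tempdir(), "Nairobi Workshops")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          file.path(project, "data", "households.csv"), row.names = FALSE)

# "Send the folder to a colleague": copy the whole folder somewhere else
colleague <- file.path(tempdir(), "colleague")
dir.create(colleague, showWarnings = FALSE)
file.copy(project, colleague, recursive = TRUE)

# Hypothetical helper: run the same relative-path code from a given folder
read_households <- function(folder) {
  old_wd <- setwd(folder)        # pretend the script was opened from `folder`
  on.exit(setwd(old_wd))         # restore the working directory afterwards
  read.csv(file.path("data", "households.csv"))  # relative path, unchanged
}

mine   <- read_households(project)
theirs <- read_households(file.path(colleague, "Nairobi Workshops"))
identical(mine, theirs)          # TRUE: same code, different folder
```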

R packages

  • R is a language that is built on packages
    • Packages are collections of functions that do specific things
    • R comes with a set of “base” packages that are installed automatically
  • We are going to use one package consistently, called the “tidyverse”
    • This consists of a set of packages that are designed to work together, with data cleaning in mind
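You can see the base packages for yourself. This sketch lists the packages that ship with R and shows which ones are already attached in the current session. The tidyverse is not among them, which is why it has to be installed separately (once it is installed, tidyverse::tidyverse_packages() lists the packages it bundles).

```r
# Packages that ship with R itself have priority "base"
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Some of these are attached automatically in every session
search()

# So their functions work without library(), e.g. median() from stats
median(c(3, 1, 2))   # 2
```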

The one exception to always using a script? I install packages in the CONSOLE. You can install packages like this:

Code
install.packages("tidyverse") # this will install the tidyverse package. Note the quotes!
  • You only need to install a package once on your computer.
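Because a package only needs to be installed once, a common pattern (an alternative to installing in the console, not the approach used in these slides) is to check first and install only when the package is missing. The helper name install_if_missing() is hypothetical; requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: install a package only if it is not already installed.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)          # pkg holds the package name as a string
  }
  invisible(requireNamespace(pkg, quietly = TRUE))  # TRUE if now available
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")

# In the console you might run, for example:
# install_if_missing("tidyverse")
```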

The first thing you’ll do in your script is load packages. You do it like this:

Code
# This script is part of the Nairobi Workshop on SAE.
# Date: 26 August 2024 (written earlier!)
# Author: Josh Merfeld
# Load packages (libraries)
library(tidyverse)
  • Note that the first part is a comment I’ve added to the script.
    • I make a lot of comments!
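Unlike installing, loading has to happen in every new R session, which is why library(tidyverse) sits at the top of the script. This sketch shows the effect with tools, a package that ships with R but (assuming default startup settings) is not attached automatically, so it runs without installing anything.

```r
# In a fresh session with default settings, tools is installed but not attached
"package:tools" %in% search()

# library() attaches it for the rest of this R session only
library(tools)
"package:tools" %in% search()    # now TRUE

# Its functions can now be called by their short names
file_ext("day1.R")               # "R"
```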